 Voting & Elections


ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Neural Information Processing Systems

Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes remains challenging, especially for complex multi-hop questions requiring multiple retrieval steps. We propose ReSearch, a novel framework that trains LLMs to Reason with Search via reinforcement learning without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning. We train ReSearch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct)


Democratic Socialist Leads in D.C. Mayor Race--Furthering Breakout Year For Left

TIME - Tech



Safety Pretraining: Toward the Next Generation of Safe AI

Neural Information Processing Systems

As large language models (LLMs) are increasingly deployed in high-stakes settings, the risk of generating harmful or toxic content remains a central challenge. Post-hoc alignment methods are brittle: once unsafe patterns are learned during pretraining, they are hard to remove. In this work, we present a data-centric pretraining framework that builds safety into the model from the start. Our framework consists of four key steps: (i) Safety Filtering: building a safety classifier to classify webdata into safe and unsafe categories; (ii) Safety Rephrasing: we recontextualize unsafe webdata into safer narratives; (iii) Native Refusal: we synthetically generate pretraining datasets that actively teach models to refuse on unsafe content and the moral reasoning behind it, and (iv) Harmfulness-Tag annotated pretraining: we flag unsafe content during pretraining using a special token, and use it to steer models away from unsafe generations at inference-time. Our safety-pretrained models reduce attack success rates from 38.8% to 8.4% on standard LLM safety benchmarks with no performance degradation on general tasks.


Turkey's Erdogan Is Running Out of Tricks

TIME - Tech



Tight Bounds On The Distortion of Randomized and Deterministic Distributed Voting

Neural Information Processing Systems

We study metric distortion in distributed voting, where $n$ voters are partitioned into $k$ groups, each selecting a local representative, and a final winner is chosen from these representatives (or from the entire set of candidates). This setting models systems like U.S. presidential elections, where state-level decisions determine the national outcome. We focus on four cost objectives from Anshelevich et al. [1]: avg-avg, avg-max, max-avg, and max-max. We present improved distortion bounds for both deterministic and randomized mechanisms, offering a near-complete characterization of distortion in this model. For deterministic mechanisms, we reduce the upper bound for avg-max from 11 to 7, establish a tight lower bound of 5 for max-avg (improving on $2+\sqrt{5}$), and tighten the upper bound for max-max from 5 to 3. For randomized mechanisms, we consider two settings: (i) only the second stage is randomized, and (ii) both stages may be randomized. In case (i), we prove tight bounds: $5-2/k$ for avg-avg, $3$ for avg-max and max-max, and $5$ for max-avg. In case (ii), we show tight bounds of $3$ for max-avg and max-max, and nearly tight bounds for avg-avg and avg-max within $[3-2/n,\ 3-2/(kn^*)]$ and $[3-2/n,\ 3]$, respectively, where $n^*$ denotes the largest group size.
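The four cost objectives can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's construction: voters and candidates are placed on a line, and each objective name is read as "aggregation across groups, then aggregation within a group" (so avg-max is taken to mean the average over groups of the maximum voter distance inside each group). That reading, and the helper names `cost` and `instance_ratio`, are assumptions made for this example.

```python
from statistics import fmean

# Assumed reading: the first word aggregates across groups, the second within a group.
AGGREGATORS = {"avg": fmean, "max": max}


def cost(candidate, groups, across, within):
    """Cost of a candidate on a line metric under the objective `across`-`within`."""
    per_group = [
        AGGREGATORS[within]([abs(voter - candidate) for voter in group])
        for group in groups
    ]
    return AGGREGATORS[across](per_group)


def instance_ratio(winner, candidates, groups, across, within):
    """Cost of `winner` divided by the optimal cost on this single instance.

    Distortion is the worst case of this ratio over all instances consistent
    with the voters' rankings; this function only evaluates one instance.
    """
    best = min(cost(c, groups, across, within) for c in candidates)
    chosen = cost(winner, groups, across, within)
    if best == 0:
        return 1.0 if chosen == 0 else float("inf")
    return chosen / best


# Hypothetical instance: two groups of voters and two candidates on a line.
groups = [[0.0, 0.0, 1.0], [4.0]]
candidates = [0.0, 4.0]
for across in ("avg", "max"):
    for within in ("avg", "max"):
        ratio = instance_ratio(0.0, candidates, groups, across, within)
        print(f"{across}-{within}: ratio of candidate 0.0 = {ratio:.3f}")
```

The mechanisms studied in the abstract see only ordinal preferences, so they cannot compute these costs directly; the sketch only shows how a chosen winner would be scored after the fact.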


What's Going On in Donald Trump's Head? We Don't Have Brain Scans. We Do Have This.

Slate

No one can say for sure what's going on in the president's head. His 25 greatest obsessions can get us a little closer. This is the year the first baby boomers--those born in 1946--turn 80, and that cohort includes Donald Trump. We have all recently lived through what it means to have an 80-year-old commander in chief, but at a political moment that's simultaneously more horrific, erratic, and just plain befuddling than anything this country has seen in ages, we wanted to understand the brain of an 80-year-old president. Plenty of people are trying to discern whether his recent rants and raves are due to a more serious cognitive decline--we understand the instinct; we've done it too--but we went a different (if related) route. The more we dug into Trump's many fixations, the more we realized that this man still thinks he lives in the 1980s. We also discovered--without too much surprise--that he often seems to fundamentally misunderstand the works he treasures most deeply. These items might not replace a brain map, but they do create a certain holistic view of what animates and splinters Trump's mind. Sometimes, they just help explain his worldview. Other times, they seem to have had real influence on policy and the America that Trump is trying to create. Welcome to Trump Brain, the 25 things that define who the president is--and what he wants. When millions of people took to the streets in October to protest Trump's authoritarianism, the president responded by dunking on his critics online. Specifically, he posted an A.I.-generated video of a fighter jet, piloted by himself in a literal crown, dropping human excrement onto the crowds. It was perhaps Trump's most juvenile use of A.I. slop yet--the kind of low-quality, feverish content made possible by artificial intelligence. Trump undoubtedly is the perfect president for the A.I. slop era. 
In some ways, this is because he's the ideal audience for it: Like many older internet users delighted by the technology, Trump seems to enjoy mindless, cartoonish, childish content. One of the videos he shared depicted him playing soccer with Cristiano Ronaldo in the Oval Office.


Meet Nithya Raman, the Progressive Democrat Who Secured the Second Spot in LA Mayor Race

TIME - Tech



Lopez: As Compton students ace tests, educators are baffled by Rep. Maxine Waters' snub of school bond

Los Angeles Times

Students walk on campus at Dominguez High School in Compton. A bond measure would provide millions of dollars to rebuild the school.


Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

arXiv.org Machine Learning

Many applications require statistically valid inference across many related "tasks", while using only a handful of high-quality labels per hypothesis. In AI evaluation, these tasks may correspond to model behaviors across prompts, subgroups, or hypotheses; in social science surveys, they may correspond to related questions, populations, or measurement conditions. Prediction-powered inference (PPI) uses abundant but inexpensive proxy measurements to improve inference from limited, "ground-truth" labels, but commonly used methods treat tasks independently and therefore fail to exploit shared structure across related tasks. This limitation is especially important in settings where only a small number of labels are available per task. To address this issue, we introduce a multi-task prediction-powered inference framework that uses labeled data from related tasks to improve power while preserving task-specific inference. Our methods exploit the shared structure in the proxy-ground-truth relationship through cross-task recalibration, while retaining within-task rectification and power tuning to construct accurate point estimates and confidence intervals. We prove that efficiency gains beyond power-tuned PPI are only possible when the proxy-ground-truth relationship contains nonlinear structure; affine cross-task recalibrations are asymptotically equivalent to using the original proxy. We complement our theoretical findings with experiments on synthetic and semi-synthetic datasets, as well as a case study auditing language models on election-related information during the 2024 U.S. presidential election. Using a large human-annotation study, we show that cross-task recalibration can substantially reduce confidence interval widths when labels are scarce.
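For orientation, the single-task baseline that the abstract builds on can be sketched in a few lines. This is a minimal illustration of the standard prediction-powered mean estimator (proxy mean on unlabeled data plus a rectifier estimated from the labeled data), not the multi-task method with cross-task recalibration or power tuning that the paper proposes. The function name `ppi_mean` and the toy numbers are assumptions made for this example.

```python
from statistics import NormalDist, fmean, variance


def ppi_mean(labels, proxy_labeled, proxy_unlabeled, alpha=0.05):
    """Basic prediction-powered estimate of a mean with a normal-approximation CI.

    labels:          ground-truth values on the small labeled set
    proxy_labeled:   proxy measurements on the same labeled items
    proxy_unlabeled: proxy measurements on the large unlabeled set
    """
    # Rectifier: how far the proxy is from the truth on the labeled items.
    rectifier = [y - f for y, f in zip(labels, proxy_labeled)]
    estimate = fmean(proxy_unlabeled) + fmean(rectifier)

    # The two terms come from independent samples, so their variances add.
    std_err = (
        variance(proxy_unlabeled) / len(proxy_unlabeled)
        + variance(rectifier) / len(rectifier)
    ) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


# Hypothetical toy data: four labeled items, four unlabeled proxy scores.
labels = [1.0, 0.0, 1.0, 1.0]
proxy_labeled = [0.8, 0.2, 0.9, 0.7]
proxy_unlabeled = [0.6, 0.4, 0.8, 0.2]
estimate, interval = ppi_mean(labels, proxy_labeled, proxy_unlabeled)
print(estimate, interval)
```

With only a handful of labels per task, the rectifier term dominates the interval width, which is the regime in which the abstract argues that sharing information across tasks helps.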


The GOP's Attacks on James Talarico Are Straight Out of the Incel Handbook

WIRED

Claims about low testosterone and false accusations of veganism might play well to the online far right, but will they win an election? Democratic US Senate candidate James Talarico speaks in Houston, Texas. On Tuesday, with Donald Trump's endorsement and the backing of the MAGA faithful, scandal-ridden Texas attorney general Ken Paxton defeated incumbent US senator John Cornyn in a runoff primary to claim the Republican nomination for that seat. He then quickly set about painting his general-election opponent, Democratic Texas state representative James Talarico, as insufficiently masculine. "My opponent is the most extreme radical that Democrats have ever nominated," Paxton said in his victory speech.